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Abstract 

We describe a statistical hypothesis test for the presence of a signal based on 
the likelihood ratio statistic. We derive the test for one case of interest and 
also show that for that case the test works very well, even far out in the tails of 
the distribution. We also study extensions of the test to cases where there are 
multiple channels. 

1 Introduction 

In recent years much work has been done on the problem of setting limits in the presence of nuisance 
parameters, beginning with the seminal paper by Feldman and Cousins HI. A fairly comprehensive 
solution of this problem was given in Rolke, Lopez and Conrad [2]. In this paper we will study a related 
problem, namely that of claiming a new discovery, say of a new particle or decay mode. Statistically 
this falls under the heading of hypothesis testing. We will describe a test derived in a fairly standard way 
called the likelihood ratio test. The main contribution of this paper is the study of the performance of this 
test. This is essential for two reasons. First, discoveries in high energy physics require a very small false- 
positive, that is the probability of falsely claiming a discovery has to be very small. This probability, in 
statistics called the type I error probability a, is sometimes required to be as low as 2.87-10^^, equivalent 
to a 5a event. The likelihood ratio test is an approximate test, and whether the approximation works this 
far out in the tails is a question that needs to be investigated. Secondly, in high energy physics we can 
often make use of multiple channels, which means we have problems with as many as 30 parameters, 20 
of which are nuisance parameters. The sizes of the samples needed to insure that the likelihood ratio test 
works need to be determined. 

2 Likelihood Ratio Test 

We will consider the following general problem: we have data X from a distribution with density /(x; 0) 
where is a vector of parameters with 9 ^ Q and Q is the entire parameter space. We wish to test the 
null hypothesis Hq : 9 € Qq (no signal) vs the alternative hypothesis. Ha : 6* G 0q (some signal), where 
00 is some subset of G. The likelihood function is defined by 

L(0|x) = /(x;0) 

and the likelihood ratio test statistic is defined by 
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Intuitively we can understand the statistic in the case of a discrete random variable. In this case the nu- 
merator is the maximum probability of the observed sample if the maximum is taken over all parameters 
allowed under the null hypothesis. In the denominator we take the maximum over all possible values of 
the parameter. The ratio of these is small if there are parameter points in the alternative hypothesis for 
which the observed sample is much more likely than for any parameter point in the null hypothesis. In 
that case we should reject the null hypothesis. Therefore we define the likelihood ratio test to be: reject 
the null hypothesis if A(x) < c, for some suitably chosen c, which in turn depends on the type I error 
probability a. 



How do we find c? For this we will use the following theorem: under some mild regularity 
conditions if G ©q then — 21ogA(x) has a chi-square distribution as the sample size n ^ oo. The 
degrees of freedom of the chi-square distribution is the difference between the number of free parameters 
specified by 6* G 0o and the number of free parameters specified by 6* G 0. 

A proof of this theorem is given in Stuart, Ord and Arnold f3 ] and a nice discussion with examples 
can be found in Casella and Berger |j4l. 



3 A Specific Example: A Counting Experiment with Background and Efficiency 

We begin with a very common type of situation in high energy physics experiments. After suitably 
chosen cuts we find n events in the signal region, some of which may be signal events. We can model 
n as a random variable N with a Poisson distribution with rate es + h where h is the background rate, 
s the signal rate and e the efficiency on the signal. We also have an independent measurement y of the 
background rate, either from data sidebands or from Monte Carlo and we can model y as a Poisson with 
rate rb, where r is the relative size of the sidebands to the signal region or the relative size of the Monte 
Carlo sample to the data sample, so that y/r is the point estimate of the background rate in the signal 
region. Finally we have an independent measurement of the efficiency z, usually from Monte Carlo, 
and we will model 2; as a Gaussian with mean e and standard deviation cje. So we have the following 
probability model: 

N ~ Pois{es + h) F ~ Pois{Tb) Z ~ (e, cJe) 

In this model s is the parameter of interest, e and b are nuisance parameters and r and are assumed to 
be known. Now the joint density of N, Y and Z is given by 

f{n,y,z;e,s,b) = ^ r^e i^^+^^^-^e — e 

Finding the denominator of the likelihood ratio test statistic A means finding the maximum likelihood 
estimators of e, s, b. They are given by s = n — y/r, b = y/r and e = z. 

We wish to test Hq : s = vs Ha : s > 0, so under the null hypothesis we have 

log /(n, y, z- 0, b,e) =n log (6) - log(n!) - b+ 
ylog(r6) -log(y!) - (rb) - ilog(27ra2) - 

and we find that this is maximized for b = and e = z. Now 
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One special case of this needs to be do be studied separately, namely the case y = 0. In this case we can 
not take the logarithm and the maxima above have to be found in a different way. It turns out that the 
MLE's axe's = n, b = ,e = z, and under the null hypothesis we find 5 = and e = z. With this we 
find A(n, 0, z) = (l + r)-". 

First we note that the test statistic does no involve z, the estimate of the efficiency. This is actually 
clear: the efficiency is for the detection of signal events, but under the null hypothesis there are none. Of 
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course the efficiency will affect the power curve: if e is small the observed n will be small and it will be 
much harder to reject the null hypothesis. 

Now from the general theory we know that —2 log A(A^, Y, Z) has a chi-square distribution with 1 
degree of freedom because in the general model there are 3 free parameters and under the null hypothesis 
there are 2. So if we denote the test statistic by L(n, y) we get 



Lin,y) = -2log X{n,y,z) 



if y > 



I 2 log(n) + y log{y) - (n + y) log j - y log(r) 
\ 2nlog(l + r) ify = 

and we have L{N, Y) ~ Xv approximately. 

Obviously we will only claim a disovery if there is an excess of events in the signal region, and so 
the test becomes: reject Hq if n > y/r and L{n, y) > c. Now it can be shown that c = qxii^ — 2a), 
the (1 — 2a) quantile of a chi-squared distribution with one degree of freedom. 

The situation described here has previously been studied in Rolke, Lopez and Conrad IS in the 
context of setting limits. They proposed a solution based on the profile likelihood. This solution is 
closely related to the test described here. In fact it is the confidence interval one finds when inverting the 
test described above. 



4 Multiple Channels 

In high energy physics we can sometimes make use of multiple channels. There are a number of possible 
extensions from one channel. We will consider the following model: there are k channels and we have 
Ni ~ Pois{eiSi + bi), Yi ~ Pois{Tibi), i = l,..,k, all independent. We will again find that the 
efficiencies do not affect the type I error probability. We will discuss two ways to extend the methods 
above to multiple channels, both with certain advantages and disadvantages. 



4.1 Method 1: (Full LRT) 

We can calculate the likelihood ratio statistic for the full model. It turns out that the test statistic is 
given by 

k 

Lk{n,y) =^L{ni,yi)I{ni > yi/ri) 

i=l 

where / is the indicator function, that is /(n > y/r) = 1 if n > y/r, and otherwise. In other words 
the test statistic is simply the sum of the test statistics for each channel separately. The test is then as 
follows: we reject Hq if Lfc(n, y) > c. It can be shown that the distribution of the test statistic under the 
null hypothesis is a linear combination of chi-square distributions. Tables of critical values as well as a 
routine for calculating them are available from the authors. 



4.2 Method 2: (Max LRT) 

Here we will use the following test: reject Hq if M = maxj{L(nj, yj)/(nj > yi/ri} > c, that is, 
we claim a discovery if there is a significant excess of events in any one channel. For this method the 
critical value c is found using Bonferroni's method. We therefore reject Hq if M > c, where c = 

(zx?(i-2(i-^r^)). 

As we shall see soon, which of these two methods performs better depends on the experiment. 



5 Performance 

How do the above tests perform? In order to be a proper test they first of all have to achieve the nominal 
type I error probability a. If they do we can then further study their performance by considering their 



power function /3(s) given by 

/?(s) = P(reject Hq] tme signal rate is s) 

Of course we have a = /3(0). gives us the discovery potential, that is the probabihty of correctly 
claiming a discovery if the true signal rate is s > 0. 

In simple cases the true type I error probabihty a and the power P{s) can be calculated explicitly, 
in more difficult cases we generally need to use Monte Carlo. Moreover, if Monte Carlo is used a 
technique called importance sampling makes it possible to find the true type I error probabihty even out 
at 5cr. 

First we will study the true type I error probabihty as a function of the background rate. In figure 
1 we calculate a (expressed in sigma's) for background rates ranging from 6 = 5 to 6 = 50. Here we 
have used r = 1 and a corresponding to 3a, Aa and 5(t. 

It is clear that even for moderate background rates (say b > 20) the true type I error is basically 
the same as the nominal one. For smaller background rates, the method is conservative, that is, the true 
significance of a signal is actually even higher than the one claimed, and it is therefore safe to use the 
method even for small b. 

In figure 2 we have the power curves for 6 = 50, r = 1, e = 1, s from to 100 and a correspond- 
ing to 3(7, 4(7 and 5(7. This clearly shows the "penalty" of requiring a discovery threshold of 5a: at that 
level the true signal rate has to be 83 for a 90% chance of making a discovery. If 3a is used a rate of 52 
is sufficient, and for Aa it is 67. 

Let us now consider the case of multiple channels. In figure 3 we have the results of the following 
simulation: There are 5 channels, all with the same background, going from 10 to 100, and the same 
T = 1. Again we see that the test achieves the nominal a even for small background rates. 

For the last study we will compare the two methods for multiple channels. In figure 4 we have the 
power curves for the following situations: we have 5 channels with 6 = 50, e = 1, and r = 1 for all 
channels. In case 1 the signal rate s goes from to 75 and is the same in all channels. In case 2 we have 
si going from to 100 and S2 = ■■ = S5 = 0. All simulations are done using a = 5a. Clearly in case I 
Full Lrt does better whereas in case 2 it is Max Lrt. 

This is not surprising because the maximum makes this method more sensitive to the "strongest" 
channel whereas the sum makes Full Lrt more sensitive to a "balance" of the chaimels. In practice, of 
course, a decision on which method to use has to be made before any data is seen. A discussion of the 
optimum sttategy for making such a decision is beyond the scope of this paper. 

6 Further Extensions 

Our extension to multiple channels assumes possibly different signal rates in each channel. The most 
common situation involves different decay channels of a particle whose existence is being tested. In that 
case, the different signal rates are due to different branching ratios such that si = ris with a common 
s. A detailed discussion of this case along with the inclusion of information on certain variables in each 
event (a technique generally known as marked Poisson) will be found in an upcoming paper. 

7 Summary 

We have discussed a hypothesis test for the presence of a signal. For the case of a Poisson distributed 
signal with a background that has either a Poisson or a Gaussian distribution we have carried out the 
calculations and done an extensive performance study. We have shown that the test achieves the nominal 
type I error probability a, even at a 5a level. We extended the test to the case of multiple chaimels 
with two possible tests and showed that both achieve the nominal a. Either one or the other has better 
performance depending on the specific experiment. 
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Fig. 1: Type I error probability a for different values of the background rate h 
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Fig. 2: Power of Test for 6 = 50, r = 1 
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Fig. 3: Type I error probability a for different values of the background rate b for the 5 channel case. 
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Fig. 4: Power of two methods with 5 channels. Case 1 has equal signal in all channels, case two has signal in 
channel one and no signals in the others. 
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